Overview

Dataset statistics

Number of variables: 15
Number of observations: 13414
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.5 MiB
Average record size in memory: 120.0 B
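
The memory figures above are consistent with each other, which a short calculation can confirm. The sketch below assumes every cell occupies 8 bytes (a 64-bit number or a 64-bit object pointer); that width is an assumption, not something the report states, and the function name is hypothetical.

```python
# Sanity check of the memory figures in the dataset statistics.
# Assumption (not stated in the report): each cell occupies 8 bytes.
BYTES_PER_CELL = 8


def memory_summary(n_observations, n_variables, bytes_per_cell=BYTES_PER_CELL):
    """Return (record size in B, total size in MiB, per-column size in KiB)."""
    record_bytes = n_variables * bytes_per_cell
    total_mib = n_observations * record_bytes / 2**20
    column_kib = n_observations * bytes_per_cell / 2**10
    return record_bytes, total_mib, column_kib


record_bytes, total_mib, column_kib = memory_summary(13414, 15)
print(f"{record_bytes} B per record")
print(f"{total_mib:.1f} MiB in total")
print(f"{column_kib:.1f} KiB per column")
```

Under that assumption the three values round to the 120.0 B record size, the 1.5 MiB total, and the 104.8 KiB memory size listed for each variable.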

Variable types

NUM: 8
CAT: 5
BOOL: 2

Reproduction

Analysis started: 2020-07-19 13:25:40.909284
Analysis finished: 2020-07-19 13:25:55.069785
Duration: 14.16 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

df_index has unique values (Unique)
employee_id has unique values (Unique)

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct count: 13414
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7075.366930073058
Minimum: 2
Maximum: 14120
Zeros: 0
Zeros (%): 0.0%
Memory size: 104.8 KiB

Quantile statistics

Minimum: 2
5-th percentile: 713.65
Q1: 3554.25
Median: 7084.5
Q3: 10600.75
95-th percentile: 13409.35
Maximum: 14120
Range: 14118
Interquartile range (IQR): 7046.5

Descriptive statistics

Standard deviation: 4074.937248
Coefficient of variation (CV): 0.5759329924
Kurtosis: -1.199513238
Mean: 7075.36693
Median Absolute Deviation (MAD): 3523.5
Skewness: -0.005876662528
Sum: 94908972
Variance: 16605113.58

Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
2047                  1      < 0.1%
773                   1      < 0.1%
8897                  1      < 0.1%
10944                 1      < 0.1%
4791                  1      < 0.1%
6838                  1      < 0.1%
693                   1      < 0.1%
2740                  1      < 0.1%
12979                 1      < 0.1%
8881                  1      < 0.1%
Other values (13404)  13404  99.9%

Minimum 5 values

Value  Count  Frequency (%)
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
14120  1      < 0.1%
14119  1      < 0.1%
14118  1      < 0.1%
14117  1      < 0.1%
14116  1      < 0.1%

age
Real number (ℝ≥0)

Distinct count: 36
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.864768152676305
Minimum: 22
Maximum: 57
Zeros: 0
Zeros (%): 0.0%
Memory size: 104.8 KiB

Quantile statistics

Minimum: 22
5-th percentile: 22
Q1: 24
Median: 29
Q3: 41
95-th percentile: 52
Maximum: 57
Range: 35
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 9.95866612
Coefficient of variation (CV): 0.3030195154
Kurtosis: -0.8537566217
Mean: 32.86476815
Median Absolute Deviation (MAD): 6
Skewness: 0.7075608107
Sum: 440848
Variance: 99.1750309
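
Several of the statistics reported for each numeric variable follow from the others. As a rough illustration, the sketch below recomputes the range, IQR, coefficient of variation, variance, and mean for age from the remaining reported values. The function name is hypothetical, and the relationships used (CV as standard deviation over mean, variance as squared standard deviation) are the standard definitions, assumed here to be the ones the report applies.

```python
def derived_statistics(minimum, maximum, q1, q3, mean, std, total, n):
    """Recompute derived summary statistics from the basic ones."""
    return {
        "range": maximum - minimum,   # spread between the extremes
        "iqr": q3 - q1,               # spread of the middle 50%
        "cv": std / mean,             # coefficient of variation
        "variance": std ** 2,         # squared standard deviation
        "mean_from_sum": total / n,   # mean recovered from sum and count
    }


# Values copied from the age statistics in this report.
age = derived_statistics(
    minimum=22, maximum=57, q1=24, q3=41,
    mean=32.86476815, std=9.95866612,
    total=440848, n=13414,
)
for name, value in age.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the reported Range (35), IQR (17), CV, Variance, and Mean up to rounding.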

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
24                 1231   9.2%
25                 1188   8.9%
23                 1130   8.4%
22                 1116   8.3%
27                 648    4.8%
29                 626    4.7%
28                 617    4.6%
26                 601    4.5%
42                 292    2.2%
37                 266    2.0%
Other values (26)  5699   42.5%

Minimum 5 values

Value  Count  Frequency (%)
22     1116   8.3%
23     1130   8.4%
24     1231   9.2%
25     1188   8.9%
26     601    4.5%

Maximum 5 values

Value  Count  Frequency (%)
57     33     0.2%
56     23     0.2%
55     36     0.3%
54     214    1.6%
53     220    1.6%

avg_monthly_hrs
Real number (ℝ≥0)

Distinct count: 249
Unique (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 200.05524079320114
Minimum: 49.0
Maximum: 310.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 104.8 KiB

Quantile statistics

Minimum: 49
5-th percentile: 128
Q1: 155
Median: 199
Q3: 245
95-th percentile: 275
Maximum: 310
Range: 261
Interquartile range (IQR): 90

Descriptive statistics

Standard deviation: 50.85013648
Coefficient of variation (CV): 0.2541804767
Kurtosis: -1.040765486
Mean: 200.0552408
Median Absolute Deviation (MAD): 45
Skewness: 0.009066470968
Sum: 2683541
Variance: 2585.73638

Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
156                 135    1.0%
135                 135    1.0%
151                 133    1.0%
149                 132    1.0%
160                 120    0.9%
143                 116    0.9%
157                 113    0.8%
145                 113    0.8%
257                 113    0.8%
148                 112    0.8%
Other values (239)  12192  90.9%

Minimum 5 values

Value  Count  Frequency (%)
49     3      < 0.1%
52     1      < 0.1%
54     2      < 0.1%
55     1      < 0.1%
56     1      < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
310    12     0.1%
309    15     0.1%
308    16     0.1%
307    14     0.1%
306    17     0.1%

department
Categorical

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 104.8 KiB

Bar chart (top 5 categories): D00-SS 3896, D00-ENG 2575, D00-SP 2109, D00-IT 1359, D00-PD 853, Other values (6) 2622

Value    Count  Frequency (%)
D00-SS   3896   29.0%
D00-ENG  2575   19.2%
D00-SP   2109   15.7%
D00-IT   1359   10.1%
D00-PD   853    6.4%
D00-MT   812    6.1%
D00-FN   722    5.4%
D00-MN   590    4.4%
D00-AD   175    1.3%
D00-PR   173    1.3%

Length

Max length: 7
Median length: 6
Mean length: 6.19196362
Min length: 6

employee_id
Real number (ℝ≥0)

UNIQUE

Distinct count: 13414
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 112114.01654987327
Minimum: 100102
Maximum: 148988
Zeros: 0
Zeros (%): 0.0%
Memory size: 104.8 KiB

Quantile statistics

Minimum: 100102
5-th percentile: 101230.65
Q1: 105786.75
Median: 111288.5
Q3: 116637.75
95-th percentile: 127877.45
Maximum: 148988
Range: 48886
Interquartile range (IQR): 10851

Descriptive statistics

Standard deviation: 8481.397727
Coefficient of variation (CV): 0.0756497536
Kurtosis: 2.768847548
Mean: 112114.0165
Median Absolute Deviation (MAD): 5429
Skewness: 1.304940316
Sum: 1503897418
Variance: 71934107.4

Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
102398                1      < 0.1%
111347                1      < 0.1%
117404                1      < 0.1%
103063                1      < 0.1%
105110                1      < 0.1%
101012                1      < 0.1%
111251                1      < 0.1%
113298                1      < 0.1%
107153                1      < 0.1%
109200                1      < 0.1%
Other values (13404)  13404  99.9%

Minimum 5 values

Value   Count  Frequency (%)
100102  1      < 0.1%
100103  1      < 0.1%
100106  1      < 0.1%
100107  1      < 0.1%
100108  1      < 0.1%

Maximum 5 values

Value   Count  Frequency (%)
148988  1      < 0.1%
148947  1      < 0.1%
148916  1      < 0.1%
148879  1      < 0.1%
148842  1      < 0.1%

filed_complaint
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 104.8 KiB

Value  Count  Frequency (%)
0      11459  85.4%
1      1955   14.6%

gender
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 104.8 KiB

Value   Count  Frequency (%)
Male    8823   65.8%
Female  4591   34.2%

Length

Max length: 6
Median length: 4
Mean length: 4.684508722
Min length: 4

last_evaluation
Real number (ℝ≥0)

Distinct count: 11573
Unique (%): 86.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7183358151185328
Minimum: 0.316175
Maximum: 1.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 104.8 KiB

Quantile statistics

Minimum: 0.316175
5-th percentile: 0.45923915
Q1: 0.57969775
Median: 0.718399
Q3: 0.8564885
95-th percentile: 0.9761184
Maximum: 1
Range: 0.683825
Interquartile range (IQR): 0.27679075

Descriptive statistics

Standard deviation: 0.1635715398
Coefficient of variation (CV): 0.2277090134
Kurtosis: -0.9829227857
Mean: 0.7183358151
Median Absolute Deviation (MAD): 0.1385405
Skewness: -0.0665464082
Sum: 9635.756624
Variance: 0.02675564862

Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
0.718399              1421   10.6%
1                     342    2.5%
0.896246              3      < 0.1%
0.505883              3      < 0.1%
0.978248              2      < 0.1%
0.744834              2      < 0.1%
0.536342              2      < 0.1%
0.707909              2      < 0.1%
0.695768              2      < 0.1%
0.557127              2      < 0.1%
Other values (11563)  11633  86.7%

Minimum 5 values

Value     Count  Frequency (%)
0.316175  1      < 0.1%
0.317279  1      < 0.1%
0.320953  1      < 0.1%
0.322828  1      < 0.1%
0.324239  1      < 0.1%

Maximum 5 values

Value     Count  Frequency (%)
1         342    2.5%
0.999808  1      < 0.1%
0.99939   1      < 0.1%
0.999365  1      < 0.1%
0.999259  1      < 0.1%

marital_status
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 104.8 KiB

Value      Count  Frequency (%)
Unmarried  6872   51.2%
Married    6542   48.8%

Length

Max length: 9
Median length: 9
Mean length: 8.024601163
Min length: 7

n_projects
Real number (ℝ≥0)

Distinct count: 7
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.783584314894886
Minimum: 1.0
Maximum: 7.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 104.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 4
Q3: 5
95-th percentile: 6
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.250445243
Coefficient of variation (CV): 0.3304922367
Kurtosis: -0.4910589144
Mean: 3.783584315
Median Absolute Deviation (MAD): 1
Skewness: 0.304845266
Sum: 50753
Variance: 1.563613305

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
4      3829   28.5%
3      3587   26.7%
5      2476   18.5%
2      2190   16.3%
6      1042   7.8%
7      229    1.7%
1      61     0.5%

Minimum 5 values

Value  Count  Frequency (%)
1      61     0.5%
2      2190   16.3%
3      3587   26.7%
4      3829   28.5%
5      2476   18.5%

Maximum 5 values

Value  Count  Frequency (%)
7      229    1.7%
6      1042   7.8%
5      2476   18.5%
4      3829   28.5%
3      3587   26.7%

recently_promoted
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 104.8 KiB

Value  Count  Frequency (%)
0      13132  97.9%
1      282    2.1%

salary
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 104.8 KiB

Value   Count  Frequency (%)
low     6572   49.0%
medium  5745   42.8%
high    1097   8.2%

Length

Max length: 6
Median length: 4
Mean length: 4.366631877
Min length: 3

satisfaction
Real number (ℝ≥0)

Distinct count: 12833
Unique (%): 95.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6221066565081257
Minimum: 0.0400584
Maximum: 1.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 104.8 KiB

Quantile statistics

Minimum: 0.0400584
5-th percentile: 0.13766635
Q1: 0.454246
Median: 0.649746
Q3: 0.82385025
95-th percentile: 0.9690812
Maximum: 1
Range: 0.9599416
Interquartile range (IQR): 0.36960425

Descriptive statistics

Standard deviation: 0.2490957502
Coefficient of variation (CV): 0.4004068235
Kurtosis: -0.6405903594
Mean: 0.6221066565
Median Absolute Deviation (MAD): 0.1837815
Skewness: -0.4827892417
Sum: 8344.93869
Variance: 0.06204869279

Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
1                     327    2.4%
0.621212              150    1.1%
0.764873              2      < 0.1%
0.844769              2      < 0.1%
0.566886              2      < 0.1%
0.574609              2      < 0.1%
0.607498              2      < 0.1%
0.556841              2      < 0.1%
0.938877              2      < 0.1%
0.754157              2      < 0.1%
Other values (12823)  12921  96.3%

Minimum 5 values

Value      Count  Frequency (%)
0.0400584  1      < 0.1%
0.0404774  1      < 0.1%
0.0413017  1      < 0.1%
0.0424075  1      < 0.1%
0.0448441  1      < 0.1%

Maximum 5 values

Value     Count  Frequency (%)
1         327    2.4%
0.99988   1      < 0.1%
0.999763  1      < 0.1%
0.999704  1      < 0.1%
0.999593  1      < 0.1%

status
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 104.8 KiB

Value     Count  Frequency (%)
Employed  10261  76.5%
Left      3153   23.5%

Length

Max length: 8
Median length: 8
Mean length: 7.059788281
Min length: 4

tenure
Real number (ℝ≥0)

Distinct count: 8
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.498956314298494
Minimum: 2.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 104.8 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 3
Median: 3
Q3: 4
95-th percentile: 6
Maximum: 10
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.465920052
Coefficient of variation (CV): 0.4189592324
Kurtosis: 4.874377644
Mean: 3.498956314
Median Absolute Deviation (MAD): 1
Skewness: 1.885493748
Sum: 46935
Variance: 2.148921598

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
3      5842   43.6%
2      2864   21.4%
4      2274   17.0%
5      1289   9.6%
6      618    4.6%
10     198    1.5%
7      180    1.3%
8      149    1.1%

Minimum 5 values

Value  Count  Frequency (%)
2      2864   21.4%
3      5842   43.6%
4      2274   17.0%
5      1289   9.6%
6      618    4.6%

Maximum 5 values

Value  Count  Frequency (%)
10     198    1.5%
8      149    1.1%
7      180    1.3%
6      618    4.6%
5      1289   9.6%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
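
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used to produce the report; the function name and the toy data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Hypothetical toy data, not taken from the dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(xs, ys)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(xs, [10.0 * y + 3.0 for y in ys])
print(r, r_rescaled)
```

The second call illustrates the location and scale invariance: multiplying ys by a positive constant and adding an offset gives the same r up to floating-point error.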

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
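
A minimal sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. Tied values receive the average of their rank positions, which is a common convention assumed here rather than stated in the report; all names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship (hypothetical data).
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this cubic relationship ρ is 1 because the ordering is perfectly preserved, while Pearson's r stays below 1 because the relationship is not linear.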

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
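
The pair-counting rule can be written out directly. Note that the formula as stated is the variant usually called τ-a; many libraries report a tie-adjusted variant (τ-b) instead, and which one underlies this report's matrix is not stated here. The function name and data below are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: two of the three pairs agree in order, one disagrees.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This brute-force version inspects all n(n-1)/2 pairs, so it is only practical for small samples; it is meant to show the definition rather than to be efficient.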

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
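
As a rough sketch of what a bias-corrected Cramér's V looks like, the code below applies the correction commonly attributed to Bergsma (2013) to a contingency table. It is an illustration under that assumption, not the report's own implementation; the function name and the list-of-rows input format are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)
    if n < 2:
        raise ValueError("need at least two observations")

    # Pearson chi-squared statistic; cells with zero expected count are skipped.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias of phi2 and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical 2x2 tables: independent variables vs. perfectly associated ones.
print(cramers_v_corrected([[25, 25], [25, 25]]))
print(cramers_v_corrected([[50, 0], [0, 50]]))
```

The first table has identical row distributions, so the coefficient is 0; the second puts all observations on the diagonal, so it is 1 up to floating-point error.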

Missing values

Sample

First rows

       df_index  age  avg_monthly_hrs  department  employee_id  filed_complaint  gender  last_evaluation  marital_status  n_projects  recently_promoted  salary  satisfaction  status    tenure
0      2         22   156.0            D00-SS      112586       1.0              Female  0.474082         Unmarried       2.0         0.0                medium  0.405101      Left      3.0
1      3         36   256.0            D00-SP      108071       0.0              Male    0.961360         Married         6.0         0.0                low     0.152974      Left      4.0
2      4         38   146.0            D00-SS      116915       0.0              Male    0.507349         Married         2.0         0.0                medium  0.434845      Left      3.0
3      5         22   135.0            D00-MT      104555       0.0              Male    0.482184         Unmarried       2.0         0.0                low     0.381545      Left      3.0
4      6         51   270.0            D00-PD      104706       0.0              Male    0.867087         Married         6.0         0.0                low     0.172575      Left      4.0
5      7         54   244.0            D00-IT      118536       0.0              Male    0.926197         Married         6.0         0.0                medium  0.061868      Left      5.0
6      8         43   289.0            D00-IT      111712       0.0              Male    0.929858         Married         7.0         0.0                low     0.161744      Left      4.0
7      9         49   281.0            D00-SS      119150       0.0              Male    0.907965         Married         6.0         0.0                medium  0.105749      Left      4.0
8      10        37   269.0            D00-SP      109162       0.0              Male    0.867086         Married         6.0         0.0                low     0.121133      Left      4.0
9      11        27   267.0            D00-SP      105251       0.0              Male    0.953585         Unmarried       5.0         0.0                low     0.871310      Left      6.0

Last rows

       df_index  age  avg_monthly_hrs  department  employee_id  filed_complaint  gender  last_evaluation  marital_status  n_projects  recently_promoted  salary  satisfaction  status    tenure
13404  14111     29   221.0            D00-SP      100736       1.0              Male    0.693354         Unmarried       4.0         0.0                low     0.789367      Employed  3.0
13405  14112     24   257.0            D00-SS      118889       0.0              Female  0.705896         Unmarried       3.0         0.0                low     0.739094      Employed  3.0
13406  14113     39   196.0            D00-PD      139996       0.0              Female  0.827836         Married         4.0         0.0                low     0.543141      Employed  5.0
13407  14114     25   262.0            D00-PD      112425       0.0              Male    0.765994         Unmarried       5.0         0.0                medium  0.741471      Employed  2.0
13408  14115     50   258.0            D00-ENG     122032       0.0              Female  0.899923         Married         7.0         0.0                low     0.080220      Left      4.0
13409  14116     33   141.0            D00-SS      116626       0.0              Male    0.537866         Married         3.0         0.0                low     0.610841      Employed  3.0
13410  14117     53   168.0            D00-SS      112262       0.0              Male    0.643553         Married         3.0         0.0                low     0.489559      Employed  3.0
13411  14118     24   257.0            D00-SP      108922       0.0              Male    0.718399         Unmarried       3.0         0.0                medium  0.944942      Employed  3.0
13412  14119     33   242.0            D00-IT      113539       0.0              Male    0.836603         Married         4.0         0.0                low     0.740136      Employed  2.0
13413  14120     46   171.0            D00-SP      105286       1.0              Male    0.907277         Married         3.0         0.0                low     0.506658      Employed  3.0